6DoF INSIDE-OUT TRACKING GAME CONTROLLER INITIAL REGISTRATION

ABSTRACT

Methods and apparatus are provided for 6DoF inside-out tracking game control. In one novel aspect, a multi-processor architecture is used for VI-SLAM. In one embodiment, the apparatus obtains overlapping image frames and sensor inputs of an apparatus, wherein the sensor inputs comprise gyrometer data, accelerometer data and magnetometer data, splits computation work onto a plurality of vector processors to obtain six degree of freedom (6DoF) outputs of the apparatus based on a splitting algorithm, and performs a localization process to generate 6DoF estimations, and a mapping process to generate a cloud of three-dimensional points associated to the descriptors of the map. In one embodiment, the localization process and mapping process are configured to run sequentially. In another embodiment, the localization process and mapping process are configured to run in parallel.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part, and claims priority under 35 U.S.C. § 120 from nonprovisional U.S. patent application Ser. No. 17/075,853, entitled “6DOF INSIDE-OUT TRACKING GAME CONTROLLER”, filed on Oct. 21, 2020, the subject matter of which is incorporated herein by reference. Application Ser. No. 17/075,853, in turn, claims priority under 35 U.S.C. § 120 from nonprovisional U.S. patent application Ser. No. 15/874,842, entitled “6DOF INSIDE-OUT TRACKING GAME CONTROLLER”, filed on Jan. 18, 2018, the subject matter of which is incorporated herein by reference. Application Ser. No. 15/874,842, in turn, claims priority under 35 U.S.C. § 119 from U.S. Provisional Application No. 62/447,867 entitled “A MULTI AGENT STEROSCOPIC CAMERA BASED POSITION AND POSTURE TRACKING SYSTEM FOR PORTABLE DEVICE” filed on Jan. 18, 2017, the subject matter of which is incorporated herein by reference.

TECHNICAL FIELD

The present invention relates generally to VR/AR systems and, more particularly, to a 6DoF inside-out tracking game controller, head mount device inside-out tracking, and multi-agent interaction, as well as robot position and posture tracking, route planning, and collision avoidance.

BACKGROUND

The virtual reality (VR) and augmented reality (AR) market is expected to continue to grow rapidly. The development of new technology in both hardware and software could help the AR/VR market grow even faster. As more applications use the technology, the system is required to run faster, to be more accurate, and to localize without drift.

The SLAM (simultaneous localization and mapping) algorithm is widely adopted to improve the system. However, there are three issues with using the SLAM algorithm: the scale factor, the drift problem (even with a stereo camera), and the long processing time. The state-of-the-art solutions for the drift are on-line loop closure and on-line re-localization (used in ORB-SLAM). Both are based on a bag-of-words approach (to store every patch). However, updating this bag of words is very time and CPU consuming.

Further, six degree of freedom (6DoF) data of a game controller are needed for the AR/VR system. However, the game controllers available today are not efficient and fast enough to produce the 6DoF data in real time. In particular, game controllers on the market do not provide the three dimensions of translation in 3D space.

Enhancements and improvements are required for tracking game controllers.

SUMMARY

Methods and apparatus are provided for 6DoF inside-out tracking game control. In one novel aspect, a multi-processor structure is used for VI-SLAM. In one embodiment, the apparatus obtains overlapping image frames and sensor inputs of an apparatus, wherein the sensor inputs comprise gyrometer data, accelerometer data and magnetometer data, splits computation work onto a plurality of vector processors to obtain six degree of freedom (6DoF) outputs of the apparatus based on a splitting algorithm, and performs a localization process to generate 6DoF estimations, and a mapping process to generate a cloud of three-dimensional points associated to the descriptors of the map. In one embodiment, the splitting algorithm involves dividing a current frame into N equal parts, wherein each of a set of selected vector processors processes a portion of the current frame based on a split-by-corner rule, and wherein the split-by-corner rule determines whether each pixel is a corner and classifies each pixel determined to be a corner to a compressed descriptor by converting each sub-image centered on the pixel to a 16-float descriptor using a base matrix. In one embodiment, the localization process and mapping process are configured to run sequentially, wherein the localization process is split over all of the vector processors and the mapping process is split over all of the vector processors. In another embodiment, the localization process and mapping process are configured to run in parallel, wherein the localization process is split over a first subset of the vector processors and the mapping process is split over the remaining subset of the vector processors.

In one embodiment, the 6DoF outputs are in one format selected from an output format group comprising: six floating-point values with three for the translated 3D space and three for the rotation space, twelve floating-point values with three for the translated 3D space and nine for the rotation space, six fixed-point values with three for the translated 3D space and three for the rotation space, and twelve fixed-point values with three for the translated 3D space and nine for the rotation space.
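
As a non-limiting illustration of the output formats listed above, the following Python sketch packs a pose as six values (translation plus three rotation angles) or twelve values (translation plus a 3×3 rotation matrix), in floating point or fixed point. The Z-Y-X Euler convention, the Q16.16 fixed-point scaling, and all function names are assumptions made here for illustration; the embodiments do not prescribe them.

```python
import numpy as np

FIXED_POINT_SCALE = 1 << 16  # assumed Q16.16 scaling (hypothetical choice)


def rotation_from_euler(roll, pitch, yaw):
    """Build a 3x3 rotation matrix, assuming R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return rz @ ry @ rx


def pack_six(translation, euler):
    """Six floats: three for translation, three for rotation angles."""
    return np.concatenate([translation, euler]).astype(np.float32)


def pack_twelve(translation, euler):
    """Twelve floats: three for translation, nine for the rotation matrix."""
    rotation = rotation_from_euler(*euler)
    return np.concatenate([translation, rotation.ravel()]).astype(np.float32)


def to_fixed_point(values):
    """Convert floating-point values to 32-bit fixed-point integers."""
    scaled = np.asarray(values, dtype=np.float64) * FIXED_POINT_SCALE
    return np.round(scaled).astype(np.int32)


translation = np.array([0.5, -0.25, 1.0])
euler = np.array([0.0, 0.0, np.pi / 2])  # roll, pitch, yaw in radians
six = pack_six(translation, euler)
twelve = pack_twelve(translation, euler)
six_fixed = to_fixed_point(six)
twelve_fixed = to_fixed_point(twelve)
```

The twelve-value form avoids angle-convention ambiguity at the cost of bandwidth, while the fixed-point forms suit processors without floating-point units.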

In one novel aspect, a map of the background environment is generated in advance. This reference map is a batch of visual features with pre-estimated 3D position and visual feature description. The map is used for real-time localization. During the localization process, the 3D position of the features is not updated, so the map is static. Because the map is known, there is no need to map the environment constantly. And because the map is static, the localization will not drift. The potential issue of this approach is a failure of the localization when we move too far from the reference map. We solve this problem using a light SLAM algorithm.

In one embodiment, a client-server topology is used in deploying the mapping and localization technology, which makes the client lighter in computing and less power hungry. There could be one or more clients working on the server network. Alternatively, the client can work on its own without a server, at the cost of higher power consumption.

In another embodiment, tracking and localization are based on a known map. This allows a fast processing speed to be achieved, which is useful for VR/AR applications. A calibrated stereo camera is provided in this approach to fix the scale factor problem.

Other embodiments and advantages are described in the detailed description below. This summary does not purport to define the invention. The invention is defined by the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, where like numerals indicate like components, illustrate embodiments of the invention.

FIG. 1 illustrates an exemplary block diagram of a 6DoF inside-out tracking game controller in accordance with embodiments of the current invention.

FIG. 2 illustrates exemplary multi-camera configurations in accordance with embodiments of the current invention.

FIG. 3 illustrates an exemplary data flow for the real time stereo tracking for the game controller in accordance with embodiments of the current invention.

FIG. 4 illustrates an exemplary location system data flow for the real time stereo tracking for the game controller in accordance with embodiments of the current invention.

FIG. 5A illustrates an exemplary flow diagram of splitting the VI-SLAM onto N vector processors for the localization process based on a splitting algorithm in accordance with embodiments of the current invention.

FIG. 5B illustrates an exemplary flow diagram of splitting the VI-SLAM onto N vector processors for the mapping process based on a splitting algorithm in accordance with embodiments of the current invention.

FIG. 6 illustrates an exemplary parallel and an exemplary sequential process in the multi-processor configuration in accordance with embodiments of the current invention.

FIG. 7 illustrates an exemplary diagram of a calibration process in accordance with embodiments of the current invention.

FIG. 8 illustrates an exemplary flow chart for the 6DoF inside-out tracking process in accordance with embodiments of the current invention.

FIG. 9 illustrates exemplary diagrams for registering initial 6DoF pose in an HMD tracking system in accordance with embodiments of the current invention.

FIG. 10 illustrates exemplary diagrams of registering initial 6DoF pose using a physical button on a game controller in an HMD tracking system in accordance with embodiments of the current invention.

FIG. 11A illustrates exemplary diagrams of registering initial 6DoF pose using a two-dimensional (2D) 6DoF tag in an HMD tracking system in accordance with embodiments of the current invention.

FIG. 11B illustrates exemplary diagrams of registering initial 6DoF pose by attaching a 2D 6DoF tag on the HMD in accordance with embodiments of the current invention.

FIG. 11C illustrates exemplary diagrams of registering initial 6DoF pose by attaching a 2D 6DoF tag on the game controller in accordance with embodiments of the current invention.

FIG. 12 illustrates an exemplary flow chart of registering initial 6DoF pose in an HMD system in accordance with embodiments of the current invention.

DETAILED DESCRIPTION

Reference will now be made in detail to some embodiments of the invention, examples of which are illustrated in the accompanying drawings.

FIG. 1 illustrates an exemplary block diagram of a 6DoF inside-out tracking game controller 100 in accordance with embodiments of the current invention. Game controller 100 is not limited to functioning as a game controller only. It can be an apparatus used in other scenarios and in combination with other apparatus. In one novel aspect, the 6DoF is produced in real time by game controller 100. In one embodiment, game controller 100 includes a plurality of sensors, such as sensors 121, 122, and 125. In one embodiment, the sensors are cameras. A plurality of cameras for game controller 100 generate overlapping images.

Game controller 100 also includes an inertial measurement unit (IMU) 131, an optional external memory card (SD Card) 132, and one or more wireless interfaces 133, such as a WiFi interface and a Bluetooth interface. An interface module 111 communicates with and controls the sensors, IMU 131, SD card 132, and the wireless interfaces, such as WiFi 133 and Bluetooth 134. A hardware accelerator and image signal processing unit 112 assists with image processing of the sensor inputs. IMU 131 detects movements, rotations, and magnetic heading of game controller 100. In one embodiment, IMU 131 is an integrated 9-axis sensor for the detection of movements, rotations, and magnetic heading. It comprises a triaxial, low-g acceleration sensor, a triaxial angular rate sensor and a triaxial geomagnetic sensor. IMU 131 senses orientation, angular velocity, and linear acceleration of game controller 100. In one embodiment, game controller 100 processes IMU data at a frame rate of at least 500 Hz.

In one embodiment, a plurality of cameras are mounted on the outer case of the game controller to generate overlapping views for the game controller. Using multiple cameras with overlapping views has many advantages compared to a monocular solution: the scale factor of the 3D motion does not drift; the 3D points seen in the overlapping area can be triangulated without moving the device; the matching in the overlapping area is faster and more accurate using the epipolar geometry; and the global field of view is wider, which increases the accuracy and reduces jitter.

FIG. 2 illustrates exemplary multi-camera configurations in accordance with embodiments of the current invention. The multi-camera configurations are not limited by the examples shown. In general, a plurality of cameras can be used to capture stereo images such that overlapping images are obtained. A game controller 210 has a straight outer case. Two cameras 211 and 212 are mounted close to the two ends of the outer case, and the lenses of the cameras are parallel with the surface of the outer case. A game controller 220 is configured similarly to game controller 210, with two cameras mounted close to the edge of the straight outer case. In other scenarios, four cameras are used. A game controller 230 has a straight outer case. Four cameras 231, 232, 233, and 234 are mounted close to the edge of the outer case. The cameras that are on the same side but opposite to each other are not inline. Similar to game controller 230, game controller 240 also has a straight outer case with four cameras 241, 242, 243, and 244 mounted close to the edge of the outer case. The cameras that are on the same side but opposite to each other are inline. A game controller 250 has a square outer case with four cameras 251, 252, 253, and 254. Each of the cameras is mounted on one side of the outer case. A game controller 260 has a “V” shaped outer case. Four cameras 261, 262, 263, and 264 are mounted on the outer case of game controller 260, with two cameras on each leg of the outer case. In another example, a game controller 270 has an outer case in the shape of a half hexagon. Game controller 270 has four cameras, with two mounted on each of the two outer legs of the outer case. Other configurations and/or different numbers of cameras can be used.

FIG. 3 illustrates an exemplary data flow for the real time stereo tracking for the game controller in accordance with embodiments of the current invention. Two processes are used: one localization process 312 for the real-time tracking, and one mapping process 311 for the map update. One or more stereo cameras 341 are used. The obtained images are gathered in one frame 331, which is passed through interface A to the localization process 312. Similarly, IMU unit 342 provides accelerometer, gyrometer and magnetometer data 332, which is passed through interface B to localization process 312. The localization procedure generates 6DoF pose, IMU, and feature points 322 and passes them through interface G of mapping and loop closure process 311. Mapping and loop closure 311 receives the 6DoF pose, IMU, and feature points through interface G. Mapping and loop closure 311 generates descriptors and 3D points 321 and sends them to localization 312 via its own interface F and the localization interface C. In one embodiment, mapping and loop closure 311 also generates 6DoF pose for the game controller through its interface H.

FIG. 4 illustrates an exemplary location system data flow for the real time stereo tracking for the game controller in accordance with embodiments of the current invention. The interfaces A, B, C, D, and E each correspond to the same interfaces as marked in FIG. 3. A feature detection procedure 411 receives stereo images from interface A and descriptors base 451. Feature detection 411 detects features 452 and passes them to match with local map procedure 412. Procedure 412 also receives descriptors and 3D points 462 from local map 461. Local map 461 generates data 462 based on input from interface C, which are the outputs from the mapping procedure. Procedure 412 generates 2D and 3D points 453 of the matching result and passes them to a compute pose procedure 413. The localization procedure also receives IMU inputs from interface B and passes them to a pose prediction procedure 414. Procedure 414 further receives 6DoF pose 465. 6DoF pose 465 is generated by current pose 464 based on inputs from interface E. Compute pose procedure 413 computes the pose based on inputs 453 and 454, and the result is compared with a threshold at step 421. If step 421 determines that the number of inliers is greater than the threshold, current pose 463 is passed to current pose procedure 464. If step 421 determines that the number of inliers is not greater than the threshold, the 6DoF pose, IMU, and feature points are passed to the mapping process through interface D.

In one novel aspect, the VI-SLAM algorithm is split to run on a plurality of processors based on a splitting algorithm and the sensor inputs.

FIG. 5A illustrates an exemplary flow diagram of splitting the VI-SLAM onto N vector processors for the localization process based on a splitting algorithm in accordance with embodiments of the current invention. In one embodiment, the localization process is a pipeline implemented as a state machine including a feature extraction procedure 510, a matching procedure 520, and a 6DoF estimation procedure 530.

In one embodiment, the feature detection and extraction procedure 510 is split to run on N vector processors following the splitting rule. Step 511 divides the current frame to be processed into N equal parts. Step 512 assigns each frame part to a corresponding vector processor. Each processor processes one part of the frame following a predefined algorithm. First, corners are detected. For each pixel p_i, described by a 2D coordinate in the image, and an adjustable threshold t, p_i is determined to be a corner if there exists a set of K contiguous pixels in the neighbor circle that are all brighter than (p_i+t) or all darker than (p_i−t). In some embodiments, threshold t is in the range 5<t<200. In another embodiment, K is in the range 5<K<13. In yet another embodiment, the neighbor circle has a radius of three pixels. Subsequently, in the second step, each corner pixel p_i is classified, using an n×n sub-image centered on p_i, to a compressed descriptor. This is done using a base matrix to convert each sub-image to a 16-float descriptor. The base matrix is computed with a singular value decomposition on a large set of selected features. In one embodiment, the n×n sub-image is 11×11. Let P=(p₁, . . . , p_n) be the list of feature points (2D coordinates in the image) detected from the current frame. Let D=(d₁, . . . , d_n) be the list of descriptors, pairing each feature point with its associated descriptor.
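
The two-step procedure described above can be sketched in Python as follows. This is an illustrative sketch only, not the claimed implementation: the 16-pixel discretization of the radius-three circle, the values t=20 and K=9, the split into horizontal row bands, the random training patches, and all function names are assumptions made for the example. Each call to `detect_corners` stands in for the work assigned to one vector processor.

```python
import numpy as np

# 16 offsets (dx, dy) on a radius-3 circle, in circular order (assumed layout).
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def has_contiguous_run(flags, k):
    """True if the circular boolean sequence holds k consecutive True values."""
    run = 0
    for flag in np.concatenate([flags, flags]):  # unroll the circle once
        run = run + 1 if flag else 0
        if run >= k:
            return True
    return False


def is_corner(image, x, y, t=20, k=9):
    """Corner test: K contiguous circle pixels all brighter or all darker."""
    center = int(image[y, x])
    ring = np.array([int(image[y + dy, x + dx]) for dx, dy in CIRCLE])
    return (has_contiguous_run(ring > center + t, k)
            or has_contiguous_run(ring < center - t, k))


def detect_corners(image, row_start, row_stop, border=5):
    """Detect corners in rows [row_start, row_stop); neighbors may lie outside."""
    height, width = image.shape
    rows = range(max(row_start, border), min(row_stop, height - border))
    return [(x, y) for y in rows for x in range(border, width - border)
            if is_corner(image, x, y)]


def split_rows(height, n_parts):
    """Divide the frame height into n_parts equal row bands."""
    edges = np.linspace(0, height, n_parts + 1).astype(int)
    return list(zip(edges[:-1], edges[1:]))


def compute_base_matrix(patches, dims=16):
    """Top `dims` right singular vectors of a (num_patches, n*n) matrix."""
    _, _, vt = np.linalg.svd(patches, full_matrices=False)
    return vt[:dims]


def describe(image, corner, base, n=11):
    """Project the n x n sub-image centered on the corner to 16 floats."""
    x, y = corner
    r = n // 2
    patch = image[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32).ravel()
    return base @ patch


image = np.zeros((40, 40), dtype=np.uint8)
image[10:30, 10:30] = 255  # bright square on a dark background
corners = [c for a, b in split_rows(40, 4) for c in detect_corners(image, a, b)]
rng = np.random.default_rng(0)
base = compute_base_matrix(rng.random((200, 121)))  # placeholder training set
descriptors = np.array([describe(image, c, base) for c in corners])
```

In a deployed system the base matrix would be computed offline from real feature patches rather than from random data, and the per-band calls would run concurrently.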

In another embodiment, the matching procedure 520 is split onto N vector processors. Step 521 splits the descriptor list into N parts. In one embodiment, the descriptor list is split equally into N parts. Step 522 performs descriptor matching for each descriptor range D_i by matching D_i with a subset of the map descriptors. The descriptors are split into N equal ranges. For each vector processor i, a matching algorithm is applied to D_i: processor i (0<i<N+1) runs the matching algorithm on the range D_i. The descriptors D_i are matched with a subset of the descriptors of the map, LocalMap, using the cross-matching method: each match is a pair of descriptors (d_a, d_b) such that d_a is the best candidate for d_b among the descriptors D_i of the current frame and d_b is the best candidate for d_a among the descriptors of LocalMap. Some of the descriptors of the map are associated with 3D points geo-referenced in the world (this 3D estimation is performed by the mapping algorithm). The matching therefore associates each descriptor d_i ∈ D with a 3D point p3d of LocalMap. The output of the matching is a list of pairs associating the feature points P with the 3D points of the map: M_i=((p₁,p3d₁), . . . , (p_n,p3d_n)).
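
The cross-matching method described above can be illustrated with the following Python sketch. It assumes Euclidean distance between descriptors as the measure of the "best candidate" (the distance metric is not specified in the embodiments), and the function names and synthetic data are hypothetical. Each descriptor range would be processed by a separate vector processor; here the ranges are processed in a loop.

```python
import numpy as np


def best_candidates(queries, references):
    """Index of the nearest reference descriptor for each query descriptor."""
    diffs = queries[:, None, :] - references[None, :, :]
    return np.argmin(np.linalg.norm(diffs, axis=2), axis=1)


def cross_match(frame_desc, map_desc):
    """Return (frame_index, map_index) pairs that are mutual best candidates."""
    frame_to_map = best_candidates(frame_desc, map_desc)
    map_to_frame = best_candidates(map_desc, frame_desc)
    return [(a, int(b)) for a, b in enumerate(frame_to_map)
            if map_to_frame[b] == a]


def split_and_match(frame_desc, map_desc, n_parts):
    """Split the frame descriptors into n_parts ranges and match each range."""
    matches = []
    for idx in np.array_split(np.arange(len(frame_desc)), n_parts):
        if len(idx) == 0:
            continue
        # This inner call is the work of one processor on its range D_i.
        for a, b in cross_match(frame_desc[idx], map_desc):
            matches.append((int(idx[a]), b))
    return matches


rng = np.random.default_rng(1)
map_desc = rng.normal(size=(50, 16))        # LocalMap descriptors
map_points3d = rng.normal(size=(50, 3))     # 3D point for each map descriptor
frame_points = np.array([[10, 12], [40, 8], [25, 30], [5, 44]])  # P
frame_desc = map_desc[[3, 17, 42, 8]] + 0.01 * rng.normal(size=(4, 16))

matches = split_and_match(frame_desc, map_desc, n_parts=2)
# M: pairs of (2D feature point, 3D map point)
pairs = [(frame_points[a], map_points3d[b]) for a, b in matches]
```

Requiring the match to hold in both directions discards many ambiguous associations before the pose estimation step.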

In yet another embodiment, the 6DoF estimation procedure 530 is split onto N processors. The input of this step is the N lists M_i (from the matching). The 6DoF estimation minimizes, for each pair (p_i, p3d_i) in M, the 2D difference between the projection of p3d_i in the current frame and p_i. This minimization is performed with the non-linear least squares Levenberg-Marquardt algorithm combined with the Geman-McClure M-estimator (a robust method). The robust Levenberg-Marquardt method is run on N processors. Once split, each processor i computes the reprojection errors E_i of all the elements of M_i and the Jacobian J_i of the error function for all the elements of M_i. Subsequently, the N vectors E_i are concatenated into E and the N matrices J_i are concatenated into J. The median of the absolute differences of E (MAD) is computed. The 6DoF estimate is obtained by solving the linear system (JᵀJ)X = JᵀE·MAD, where X is the update of the 6DoF.
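
A possible realization of the split estimation is sketched below in Python. Several choices are assumptions made for this example and are not taken from the embodiments: the pose is parameterized as a rotation vector plus a translation, the camera is a normalized pinhole model, the Jacobian is computed by finite differences, the MAD is used as the scale of Geman-McClure weights, and a small fixed damping term replaces a full Levenberg-Marquardt damping schedule.

```python
import numpy as np


def rodrigues(w):
    """Rotation matrix from a rotation vector (axis times angle)."""
    theta = np.linalg.norm(w)
    k = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    if theta < 1e-12:
        return np.eye(3) + k
    return (np.eye(3) + np.sin(theta) / theta * k
            + (1 - np.cos(theta)) / theta**2 * (k @ k))


def residuals(pose, pts2d, pts3d):
    """Reprojection errors for pose = [rotation vector, translation]."""
    cam = pts3d @ rodrigues(pose[:3]).T + pose[3:]
    return (cam[:, :2] / cam[:, 2:3] - pts2d).ravel()


def jacobian(pose, pts2d, pts3d, eps=1e-6):
    """Forward-difference Jacobian of the residuals w.r.t. the 6 pose values."""
    base = residuals(pose, pts2d, pts3d)
    cols = []
    for j in range(6):
        step = np.zeros(6)
        step[j] = eps
        cols.append((residuals(pose + step, pts2d, pts3d) - base) / eps)
    return np.stack(cols, axis=1)


def estimate_pose(pose, pts2d, pts3d, n_parts=4, iterations=15):
    pose = np.asarray(pose, dtype=float).copy()
    chunks = np.array_split(np.arange(len(pts2d)), n_parts)  # the lists M_i
    for _ in range(iterations):
        # Each (E_i, J_i) pair could be computed on its own processor.
        e = np.concatenate([residuals(pose, pts2d[c], pts3d[c]) for c in chunks])
        j = np.concatenate([jacobian(pose, pts2d[c], pts3d[c]) for c in chunks])
        mad = np.median(np.abs(e - np.median(e)))
        scale = 1.4826 * mad + 1e-12
        weights = (scale**2 / (scale**2 + e**2)) ** 2  # Geman-McClure weights
        jtwj = j.T @ (weights[:, None] * j)
        damping = 1e-9 * np.diag(np.diag(jtwj))  # fixed, simplified damping
        update = np.linalg.solve(jtwj + damping, -j.T @ (weights * e))
        pose = pose + update
    return pose


rng = np.random.default_rng(0)
pts3d = rng.uniform([-1, -1, 4], [1, 1, 8], size=(40, 3))
true_pose = np.array([0.05, -0.03, 0.02, 0.1, -0.05, 0.2])
pts2d = residuals(true_pose, np.zeros((40, 2)), pts3d).reshape(-1, 2)
estimated = estimate_pose(np.zeros(6), pts2d, pts3d)
```

Only the per-chunk error and Jacobian computations are distributed; the concatenation, the MAD, and the 6×6 solve are inexpensive and can remain on a single processor.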

FIG. 5B illustrates an exemplary flow diagram of splitting the VI-SLAM onto N vector processors for the mapping process based on a splitting algorithm in accordance with embodiments of the current invention. The mapping process takes feature points, descriptors and the current pose as inputs and generates a cloud of 3D points associated to the descriptors. The mapping process includes a matching procedure 560 and an optimization procedure 570. Matching procedure 560 is split onto multiple processors. The feature points coming from the localization are matched with a subset of feature points of the map using the cross-matching method. The splitting algorithm involves splitting the range of features to match into M equal ranges. Each range is processed using a different processor. Each match performed with a descriptor that is not already associated to a 3D point of the map allows the triangulation of a new 3D point. Optimization procedure 570 optimizes the pose and 3D points of the map to minimize the reprojection error with the Levenberg-Marquardt algorithm. The optimization may further involve: 3D point culling, wherein the 3D points seen in only two images are removed because they are not relevant; key frame culling, which removes the key frames (poses of the map) that have many 3D points in common because they are redundant; and a loop-closure procedure, which detects multiple instances of the same 3D point in the map and merges them in order to save memory and to reduce the drift of the 3D reconstruction.
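
The triangulation of a new 3D point and the 3D point culling mentioned above can be illustrated with the following Python sketch. The linear (DLT) triangulation shown here is one common technique and is an assumption of this example, as are the normalized projection matrices, the 0.1 baseline, and the function names.

```python
import numpy as np


def triangulate(proj_a, proj_b, pt_a, pt_b):
    """Linear (DLT) triangulation of one 3D point from two 3x4 projections."""
    rows = [pt_a[0] * proj_a[2] - proj_a[0],
            pt_a[1] * proj_a[2] - proj_a[1],
            pt_b[0] * proj_b[2] - proj_b[0],
            pt_b[1] * proj_b[2] - proj_b[1]]
    _, _, vt = np.linalg.svd(np.array(rows))
    point_h = vt[-1]                  # null vector of the 4x4 system
    return point_h[:3] / point_h[3]


def cull_points(observation_counts, min_views=3):
    """Keep ids of 3D points observed in at least `min_views` images."""
    return [pid for pid, n in observation_counts.items() if n >= min_views]


def project(proj, point):
    """Project a 3D point with a 3x4 projection matrix."""
    p = proj @ np.append(point, 1.0)
    return p[:2] / p[2]


# Two calibrated cameras separated by an assumed baseline of 0.1 along x.
proj_left = np.hstack([np.eye(3), np.zeros((3, 1))])
proj_right = np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
truth = np.array([0.2, -0.1, 5.0])
estimate = triangulate(proj_left, proj_right,
                       project(proj_left, truth), project(proj_right, truth))

# Points "a", "b", "c" were seen in 2, 3, and 7 images respectively.
kept = cull_points({"a": 2, "b": 3, "c": 7})
```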

In one novel aspect, using the multi-processor architecture, the efficiencies of the localization process and the mapping process are greatly improved.

FIG. 6 illustrates an exemplary parallel and an exemplary sequential process in the multi-processor configuration in accordance with embodiments of the current invention. The configuration includes multiple processors, such as processor 601, processor 602, processor 603, and processor 612. In one embodiment, the localization processes, such as localization 611 and localization 612, are configured to run in series on all the vector processors. Each localization process is split over all the vector processors. The mapping processes, such as mapping 612 and mapping 619, are likewise configured to run on all the vector processors. Each mapping process is split over all the vector processors. In another embodiment, the localization process and the mapping process run in parallel. A localization process 621 is configured to occupy processor 601 to processor 608 all the time. Each localization process uses a subset of the vector processors. The mapping process uses the rest of the vector processors, such as processor 609 to processor 612.
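
The two configurations can be summarized with a short Python sketch. The function name, the mode strings, and the use of reference numerals 601 to 612 as processor identifiers are illustrative assumptions only.

```python
def allocate_processors(processor_ids, mode, n_localization=None):
    """Return the processors used by the localization and mapping processes."""
    ids = list(processor_ids)
    if mode == "sequential":
        # Both processes take turns and each is split over every processor.
        return {"localization": ids, "mapping": ids}
    if mode == "parallel":
        if n_localization is None or not 0 < n_localization < len(ids):
            raise ValueError("n_localization must leave processors for mapping")
        # Localization keeps a fixed subset; mapping uses the remainder.
        return {"localization": ids[:n_localization],
                "mapping": ids[n_localization:]}
    raise ValueError(f"unknown mode: {mode}")


processors = range(601, 613)  # twelve processors, labeled 601 to 612
sequential = allocate_processors(processors, "sequential")
parallel = allocate_processors(processors, "parallel", n_localization=8)
```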

FIG. 7 illustrates an exemplary diagram of a calibration process in accordance with embodiments of the current invention. The calibration is divided into three consecutive steps. Step 701 is the intrinsic calibration. The intrinsic calibration estimates the intrinsic parameters using a known pattern, such as a planar checkerboard pattern or a 3D pattern composed of multiple checkerboards with different positions and orientations. The intrinsic calibration does not require an overlapping area and can be done before the cameras are mounted. Step 702 is the extrinsic calibration. The extrinsic calibration estimates the rotation and translation between each pair of cameras; this requires the intrinsic calibration and a known pattern visible in the overlapping area. This step must be done once all the cameras are mounted. The triangulation of the checkerboard seen at the same time by the cameras gives the baseline between the cameras. Step 703 is the intrinsic and extrinsic refinement using a motion. The device is moved in 6DoF following a known and pre-determined motion (on all the translation and orientation axes) while the frames are recorded. A structure-from-motion algorithm is then used to estimate the motion and the 3D points from the recorded frames. During this step, the intrinsic and extrinsic parameters are optimized to refine their values using a motion constraint based on the known and pre-determined motion: the optimization algorithm estimates the set of intrinsic and extrinsic parameters that are required to explain the known and pre-determined motion.

FIG. 8 illustrates an exemplary flow chart for the 6DoF inside-out tracking process in accordance with embodiments of the current invention. At step 801, the apparatus obtains overlapping image frames and sensor inputs of an apparatus, wherein the sensor inputs comprise gyrometer data, accelerometer data and magnetometer data. At step 802, the apparatus splits computation work onto a plurality of vector processors to obtain six degree of freedom (6DoF) outputs of the apparatus based on a splitting algorithm. At step 803, the apparatus performs a localization process to generate 6DoF estimations, and a mapping process to generate a cloud of three-dimensional points associated to the descriptors of the map.

Register Initial 6DoF Pose

The 6DoF tracking method provides an efficient way to track the 6DoF pose for an AR/VR system. One configuration of the AR/VR system involves tracking one device in the AR/VR environment. Another configuration includes a head mount device (HMD) and one or more handheld/hand-control game controllers. In an HMD tracking system, more than one 6DoF tracking device is present. A world coordinate system is used to coordinate the multiple devices. The world coordinate system is a common coordinate system shared by multiple devices in a system. With multiple devices sharing the same coordinate system, an origin/starting point is needed.

FIG. 9 illustrates exemplary diagrams for registering initial 6DoF pose in an HMD tracking system in accordance with embodiments of the current invention. The HMD tracking system is an AR/VR system with multiple 6DoF tracking devices, including an HMD 901 and a game controller 902. In other embodiments, more devices can be included. HMD 901 includes a sensor unit and one or more processors 917 and a memory 918 coupled with the one or more processors 917. The sensor unit includes one or more sensors and/or cameras 911, 912, and 915. The sensor unit also includes an IMU 916 that detects movements, rotations, and magnetic headings of HMD 901. Game controller 902 includes a sensor unit and one or more processors 927 and a memory 928 coupled with the one or more processors 927. The sensor unit includes one or more sensors and/or cameras 921, 922, and 925. The sensor unit also includes an IMU 926 that detects movements, rotations, and magnetic headings of game controller 902. The exemplary HMD 901 and game controller 902 may include other modules or be configured with other blocks as shown in FIG. 1. The computation and process split may be shared or separately performed for each device.

Each tracking device tracks 6DoF poses in a coordinate system 930. Each device tracks 6DoF outputs, which include six dimensions of the apparatus: three dimensions (3D) of orientation in a rotation space 931 and three dimensions of translation in a 3D space 932. Each AR/VR tracking device, such as HMD 901 and game controller 902, obtains and maintains a map. The map is a collection of 2D and 3D points in the space relative to a coordinate system. In an AR system, both real objects and virtual objects are presented in the coordinate system. For example, real objects, such as a table 935 and a plate 936, and virtual objects, such as cheese 937, are both presented in the AR system in reference to the coordinate system 930. Each tracking device collects local information through its sensor unit. Each map is generated based on the local information collected.

With multiple tracking devices in the HMD tracking system, each device tracks its own locale information relative to its own coordinate system, which may or may not be the same as the coordinate system 930. The purpose of the registration is to synchronize the coordinate system of each individual device to a world coordinate system. A world coordinate system is a common coordinate system shared by multiple devices in a system. In an AR/VR environment, its origin is typically the starting position of the HMD device. In one embodiment, one base map is shared by multiple devices. One or more devices receive a base map and generate their 6DoF relative to the base map received. The base map may be generated by the HMD and shared with the game controller. In another embodiment, the base map may be generated by the game controller and shared with the HMD. The base map may also be generated remotely by other devices in the AR/VR system. A single base map is shared by multiple devices in the AR/VR system. The base map is in reference to the world coordinate system for the HMD tracking system. Each 6DoF generated relative to the base map by each tracking device, therefore, is in reference to the world coordinate system. In another embodiment, each device generates its own map, either in reference to the same coordinate system or in its own coordinate system. In the first scenario, when different maps are generated in reference to the same coordinate system, the initialization process synchronizes the maps such that the 6DoF generated by each device are synchronized. In the second scenario, when different maps are generated in reference to different coordinate systems, the different coordinate systems of each device are synchronized at the initialization process to the world coordinate system. In other embodiments, with multiple tracking devices, a combination of the shared map and individual maps/coordinate systems may be used for different devices. The synchronization methods presented below apply to the combination scenarios. A world coordinate system is initialized/synchronized to be shared by multiple devices.
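
For the second scenario, the synchronization can be viewed as composing rigid transforms. The following Python sketch is illustrative only: it assumes that each pose is represented as a 4×4 homogeneous transform and that the registration step has produced the transform from a device's local coordinate system to the world coordinate system. The names `world_T_local` and `local_T_device` are hypothetical.

```python
import numpy as np


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = translation
    return transform


def rot_z(angle):
    """Rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


def to_world(world_T_local, local_T_device):
    """Express a pose tracked in a device's own coordinate system in the world."""
    return world_T_local @ local_T_device


# Assumed registration result: the device's local frame is rotated 90 degrees
# about z and shifted by (1, 0, 0) relative to the world coordinate system.
world_T_local = make_transform(rot_z(np.pi / 2), [1.0, 0.0, 0.0])

# Pose tracked by the device in its own coordinate system.
local_T_device = make_transform(np.eye(3), [1.0, 0.0, 0.0])

world_T_device = to_world(world_T_local, local_T_device)
world_position = world_T_device[:3, 3]
```

Once `world_T_local` is known for every device, the 6DoF outputs of all devices can be compared directly in the shared world coordinate system.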

In one novel aspect, the virtual view of the world coordinate system 930 is streamed to one or more viewing devices 960, such as a mobile device 961 and/or a desktop/laptop 962. In one embodiment, the virtual view of 930 is streamed in real time. In another embodiment, the virtual view is broadcast to multiple remote viewing devices.

FIG. 10 illustrates exemplary diagrams of registering initial 6DoF pose using a physical button on a game controller in an HMD tracking system in accordance with embodiments of the current invention. In one novel aspect, a physical button is used to register the initial 6DoF pose in the world coordinate system. An exemplary game controller 1001 has a sensor unit including multiples cameras, such as camera 1101, 1102 and 1103. The sensor unit of game controller 1001 also includes an IMU detects movements, rotations, and magnetic headings of the apparatus. The sensor unit tracks series of motions of the apparatus and collects locale information of the apparatus of relative to a map of its own coordinate system. An initialization button 1108 registers an initial 6DoF pose of the apparatus, wherein the initial pose is a 3D to a predefined coordinate on the map and an initial pose of the apparatus when pressed. Game controller 1001 includes one or more processors configured to communication with the HMD system to generate series of six degree of freedom (6DoF) outputs of the apparatus based on the series of motion and locale information, wherein each 6DoF output comprises six dimensions of the apparatus including three dimensions (3D) of an orientation in a rotation space and three dimensions translation in a 3D space. The sensor unit of game controller 1001 tracks motions of the apparatus and generates pose changes relative to the predefined coordinate on the map. The 6DoF includes the 3D coordinate and the pose of the game controller. When the button 1108 is pressed, the initial 6DoF pose is registered with the world coordinate system. An initial 3D coordinate is predefined. When the button is pressed when game controller is at 1051 pose, the pose of game controller is registered and the 3D coordinate of the initial 6DoF is registered at a predefined position 1080. The initial 6DoF of game controller 1001 is registered at the initial 6DoF 1052. 
As an example, the predefined 3D coordinate 1080 is at the center of the plate. If the game controller moves and/or rotates to 1002, with a 6DoF 1061, and the initialization button 1008 is pressed, the 3D coordinate of the game controller is registered at the same predefined 3D coordinate 1080, the center of the plate. The game controller's pose is also registered. The initial 6DoF of game controller 1002 is registered as the initial 6DoF 1062. In one embodiment, the game controller also includes a map module that obtains map data to generate the map and to register the initial pose to the map.
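As a non-limiting illustration, the button-based registration described above can be sketched as follows. The sketch is only one possible reading of the description: the names `PREDEFINED_COORDINATE`, `Pose6DoF`, `register_on_button_press`, and `position_after_registration` are hypothetical, the predefined coordinate is arbitrarily placed at the origin, and the axes of the controller's own map are assumed to be aligned with the world coordinate system.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the predefined 3D coordinate 1080
# (for example, the center of the plate). The value is an assumption.
PREDEFINED_COORDINATE = (0.0, 0.0, 0.0)


@dataclass(frozen=True)
class Pose6DoF:
    """Three dimensions of translation plus three of orientation (radians)."""
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float


def register_on_button_press(tracked: Pose6DoF) -> Pose6DoF:
    """Return the initial 6DoF registered when the button is pressed.

    The 3D coordinate is replaced by the predefined coordinate, while the
    orientation currently tracked by the sensor unit is kept as the
    initial pose.
    """
    px, py, pz = PREDEFINED_COORDINATE
    return Pose6DoF(px, py, pz, tracked.roll, tracked.pitch, tracked.yaw)


def position_after_registration(initial: Pose6DoF,
                                tracked_at_press: Pose6DoF,
                                tracked_now: Pose6DoF) -> tuple:
    """Predefined coordinate plus the translation tracked since the press.

    Assumes the controller's map axes are aligned with the world axes.
    """
    return (
        initial.x + (tracked_now.x - tracked_at_press.x),
        initial.y + (tracked_now.y - tracked_at_press.y),
        initial.z + (tracked_now.z - tracked_at_press.z),
    )
```

In this sketch, pressing the button at two different controller poses yields the same registered 3D coordinate but different registered orientations, which mirrors the relationship between initial 6DoF 1052 and initial 6DoF 1062.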

In another novel aspect, a 2D 6DoF tag is used to register the initial 6DoF in the HMD system.

FIG. 11A illustrates exemplary diagrams of registering initial 6DoF pose using a two-dimensional (2D) 6DoF tag in an HMD tracking system in accordance with embodiments of the current invention. The HMD tracking system in an AR/VR environment has multiple tracking devices. A 2D 6DoF tag 1101 is attached to a first device. When a sensor, such as a camera 1102 of a second device, captures the 2D 6DoF tag 1101, the second device derives a 6DoF 1103 relative to the map of the world coordinate system. The 6DoF 1103 comprises three dimensions of orientation in a rotation space and three dimensions of translation in a 3D space. The second device derives {x, y, z, roll, pitch, yaw} parameters when 2D 6DoF tag 1101 is in view. In one embodiment, the 2D 6DoF tag is an AprilTag.
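As a non-limiting illustration, the derivation of the {x, y, z, roll, pitch, yaw} parameters can be sketched as follows. The sketch assumes that the tag detector reports the tag pose as a 3x3 rotation matrix and a translation vector, and that the orientation is expressed with a Z-Y-X (yaw, pitch, roll) Euler convention; neither assumption is specified by the description. The function names are hypothetical, and the gimbal-lock case (pitch of plus or minus 90 degrees) is not handled.

```python
import math


def rotation_from_rpy(roll, pitch, yaw):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll) as a 3x3 nested list."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def six_dof_from_tag_pose(rotation, translation):
    """Flatten a rotation matrix and translation into six parameters.

    Inverse of rotation_from_rpy for pitch strictly between -90 and +90
    degrees (assumed Z-Y-X convention).
    """
    x, y, z = translation
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    pitch = -math.asin(max(-1.0, min(1.0, rotation[2][0])))
    roll = math.atan2(rotation[2][1], rotation[2][2])
    yaw = math.atan2(rotation[1][0], rotation[0][0])
    return {"x": x, "y": y, "z": z, "roll": roll, "pitch": pitch, "yaw": yaw}
```

A detector that reports orientation in another form, such as a quaternion, would need a different conversion; the six-parameter output would be the same.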

FIG. 11B illustrates exemplary diagrams of registering initial 6DoF pose by attaching a 2D 6DoF tag on the HMD in accordance with embodiments of the current invention. In one embodiment, the 2D 6DoF tag is attached to the HMD in the AR/VR system. HMD 1111 has a 2D 6DoF tag 1112 attached to its surface. A game controller 1113 with a camera, at step 1115, captures the image of the 2D 6DoF tag 1112 on the HMD. Game controller 1113 determines an initial relative pose to the HMD when the 6DoF tag is in the field of view of the second device, wherein 6DoF information can be generated from the capturing of the tag. The initial 6DoF 1118 is registered in the world coordinate system 1116.

FIG. 11C illustrates exemplary diagrams of registering initial 6DoF pose by attaching a 2D 6DoF tag on the game controller in accordance with embodiments of the current invention. In one embodiment, the 2D 6DoF tag is attached to a game controller in the AR/VR system. Game controller 1123 has a 2D 6DoF tag 1122 attached to its surface. An HMD 1121 with a camera, at step 1125, captures the image of the 2D 6DoF tag 1122 on the game controller. HMD 1121 determines an initial relative pose to the game controller when the 6DoF tag is in the field of view of the second device, wherein 6DoF information can be generated from the capturing of the tag. The initial 6DoF 1128 is registered in the world coordinate system 1126.
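As a non-limiting illustration, registering the second device in the world coordinate system from a captured tag can be sketched as a chain of rigid transforms. The sketch assumes that the pose of the first device in the world coordinate system and the mounting pose of the tag on the first device are already known, and that poses are represented as 4x4 homogeneous matrices; the description does not specify either. All function names are hypothetical, and `make_transform` is restricted to yaw-only rotation to keep the example short.

```python
import math


def make_transform(yaw, translation):
    """4x4 rigid transform with a rotation about z and a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a * b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform: rotation R^T, translation -R^T * p."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def register_second_device(world_from_first, first_from_tag, second_from_tag):
    """Pose of the second device in the world coordinate system.

    world_from_second = world_from_first * first_from_tag * inverse(second_from_tag),
    where second_from_tag is the tag pose observed by the second device's camera.
    """
    world_from_tag = compose(world_from_first, first_from_tag)
    return compose(world_from_tag, invert_rigid(second_from_tag))
```

The same chain applies to both arrangements: in FIG. 11B the first device is the HMD and the second device is the game controller, and in FIG. 11C the roles are swapped.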

FIG. 12 illustrates an exemplary flow chart of registering initial 6DoF pose in an HMD system in accordance with embodiments of the current invention. At step 1201, the AR/VR system collects locale information of an apparatus, wherein a series of six degree of freedom (6DoF) outputs of the apparatus is generated based on the locale information, and wherein each 6DoF output comprises six dimensions of the apparatus including three dimensions of orientation in a rotation space and three dimensions of translation in a 3D space. At step 1202, the AR/VR system obtains a map in reference to a world coordinate system of the AR/VR system, wherein the map in reference to the world coordinate system is generated based on locale information collected by the apparatus. At step 1203, the AR/VR system initializes an initial 6DoF position of the apparatus relative to the map in reference to the world coordinate system by an initializing process selected from pressing a physical button on an apparatus or capturing a two-dimensional (2D) 6DoF tag. At step 1204, the AR/VR system performs a localization process to generate localization information of the apparatus based on the initial position of the apparatus.
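As a non-limiting illustration, the selection between the two initializing processes of step 1203 can be sketched as follows. The sketch assumes that a 6DoF is carried as a plain tuple (x, y, z, roll, pitch, yaw); the names `InitMethod` and `initialize_6dof` and the argument layout are hypothetical and are not taken from the description.

```python
from enum import Enum


class InitMethod(Enum):
    """The two initializing processes named in step 1203."""
    BUTTON = "button"
    TAG = "tag"


def initialize_6dof(method, tracked_orientation=None,
                    predefined_coordinate=None, tag_6dof=None):
    """Return the initial 6DoF as (x, y, z, roll, pitch, yaw).

    BUTTON: the predefined 3D coordinate combined with the orientation
            tracked at the moment the button is pressed.
    TAG:    the 6DoF derived from capturing the 2D 6DoF tag.
    """
    if method is InitMethod.BUTTON:
        if predefined_coordinate is None or tracked_orientation is None:
            raise ValueError("button initialization needs a coordinate and an orientation")
        return tuple(predefined_coordinate) + tuple(tracked_orientation)
    if method is InitMethod.TAG:
        if tag_6dof is None or len(tag_6dof) != 6:
            raise ValueError("tag initialization needs six parameters")
        return tuple(tag_6dof)
    raise ValueError("unknown initialization method")
```

The returned tuple would then serve as the initial position from which the localization process of step 1204 proceeds.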

Although the present invention has been described in connection with certain specific embodiments for instructional purposes, the present invention is not limited thereto. Accordingly, various modifications, adaptations, and combinations of various features of the described embodiments can be practiced without departing from the scope of the invention as set forth in the claims. 

What is claimed:
 1. An apparatus operating in an augmented reality/virtual reality (AR/VR) system comprising: a sensor unit that tracks a series of motions of the apparatus and collects locale information of the apparatus relative to a world coordinate system shared by a plurality of devices; an initialization button that registers an initial six degree of freedom (6DoF) pose of the apparatus when pressed, wherein the initial 6DoF pose comprises a predefined three dimensions (3D) coordinate and an initial pose of the apparatus relative to the world coordinate system; and one or more processors configured to communicate with the AR/VR system and to generate a series of 6DoF outputs of the apparatus based on the series of motions and the locale information, wherein each 6DoF output comprises six dimensions of the apparatus including three dimensions of orientation in a rotation space and three dimensions of translation in a 3D space.
 2. The apparatus of claim 1, wherein the sensor unit comprises one or more cameras and an inertial measurement unit (IMU), and wherein the IMU detects movements, rotations, and magnetic headings of the apparatus.
 3. The apparatus of claim 1, wherein the sensor unit tracks motions of the apparatus and generates pose changes relative to the predefined coordinate relative to the world coordinate system.
 4. The apparatus of claim 1, further comprising a map module that obtains map data to generate a base map in reference to the world coordinate system and register the initial pose to the base map in reference to the world coordinate system.
 5. The apparatus of claim 4, wherein the apparatus determines real-time 6DoF outputs based on the map data and locale information relative to the initial pose.
 6. A system comprising: a head mount device (HMD) with an HMD sensor unit and one or more processors configured to communicate with the system and to generate a series of six degree of freedom (6DoF) outputs of the HMD relative to a world coordinate system shared by a plurality of devices; a game controller with a controller sensor unit and one or more processors configured to communicate with the system and to generate a series of 6DoF outputs of the controller relative to the world coordinate system; and a two dimensional (2D) 6DoF tag attached to a first device selected from the HMD and the game controller, wherein a second device determines an initial relative pose to the first device when the 6DoF tag is in the field of view of the second device.
 7. The system of claim 6, wherein the HMD sensor unit comprises one or more HMD cameras and an HMD inertial measurement unit (IMU) and the controller sensor unit comprises one or more controller cameras and a controller IMU.
 8. The system of claim 6, wherein the 6DoF tag is an AprilTag.
 9. The system of claim 6, wherein the controller sensor unit tracks motions of the apparatus and generates relative pose changes relative to the initial relative pose.
 10. The system of claim 6, wherein the one or more processors of the HMD are further configured to split computation work onto a plurality of vector processors to obtain 6DoF outputs based on a splitting algorithm.
 11. A method for an augmented reality/virtual reality (AR/VR) system, comprising: collecting locale information of an apparatus, wherein a series of six degree of freedom (6DoF) outputs of the apparatus is generated based on the locale information, and wherein each 6DoF output comprises six dimensions of the apparatus including three dimensions of orientation in a rotation space and three dimensions of translation in a 3D space; obtaining a map in reference to a world coordinate system of the AR/VR system, wherein the map in reference to the world coordinate system is generated based on locale information collected by the apparatus; initializing an initial 6DoF position of the apparatus relative to the map in reference to the world coordinate system by an initializing process selected from pressing a physical button on an apparatus or capturing a two-dimensional (2D) 6DoF tag; and performing a localization process to generate localization information of the apparatus based on the initial position of the apparatus.
 12. The method of claim 11, wherein the initializing is performed by pressing the physical button on the apparatus, and wherein the initializing registers the apparatus to a predefined coordinate on the map in reference to the world coordinate system.
 13. The method of claim 11, wherein the initializing is performed by capturing a 2D 6DoF tag, and wherein the 2D 6DoF tag is attached to a second apparatus.
 14. The method of claim 13, wherein the apparatus determines an initial relative pose to the second apparatus when the 2D 6DoF tag is in the field of view of the apparatus.
 15. The method of claim 11, further comprising: receiving real-time map data generated by the AR/VR system.
 16. The method of claim 15, wherein the apparatus determines 6DoF outputs based on the received real-time map data.
 17. The method of claim 11, further comprising: splitting computation work onto a plurality of vector processors to obtain six degree of freedom (6DoF) outputs of the apparatus based on a splitting algorithm.
 18. The method of claim 17, wherein the splitting algorithm involves: dividing a current frame into N equal parts; and each of a set of selected vector processors processing a portion of the current frame based on a split-by-corner rule, wherein the split-by-corner rule comprises determining whether each pixel is a corner and classifying each pixel determined to be a corner to a compressed descriptor by converting each sub-image centered on the pixel to a 16-float descriptor using a base matrix.
 19. The method of claim 11, wherein the localization process and mapping process are configured to run sequentially, wherein the localization process is split over all of the vector processors and the mapping process is split over all the vector processors.
 20. The method of claim 11, wherein the localization process and mapping process are configured to run in parallel, wherein the localization process is split over a first subset of the vector processors and the mapping process is split over the remaining subset of the vector processors.